CLAIMS 

What is claimed is: 

1. A method comprising: 

receiving a first operand having a set of L data elements and a second operand 
having a set of L control elements; and 

for each control element, shuffling data from a first operand data element 
designated by said control element to an associated resultant data element position if 
its flush to zero field is not set and placing a zero into said associated resultant data 
element position if its flush to zero field is set. 
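
By way of illustration only, the per-element behavior recited in claim 1 can be modeled in a few lines of Python. This is a minimal sketch, not the claimed implementation: the function name `shuffle_with_flush_to_zero`, the use of the most significant bit of each control element as the flush to zero field, and the use of the low-order bits as the selection field are all assumptions made for the example, since the claims do not fix any bit layout. The sketch places a zero when the flush to zero field is set, consistent with claim 12.

```python
def shuffle_with_flush_to_zero(data, controls, ftz_bit=7):
    """Model of a shuffle with flush to zero over L data elements.

    data     -- list of L data elements (the first operand)
    controls -- list of L control elements (the second operand)
    ftz_bit  -- ASSUMED position of the flush to zero field in a control element
    """
    L = len(data)
    if len(controls) != L:
        raise ValueError("both operands must hold L elements")
    if L == 0 or L & (L - 1):
        raise ValueError("this sketch assumes L is a power of two")

    ftz_mask = 1 << ftz_bit
    select_mask = L - 1  # ASSUMED: selection field is the low-order bits

    result = []
    for ctrl in controls:
        if ctrl & ftz_mask:
            # Flush to zero field set: the associated position gets a zero.
            result.append(0)
        else:
            # Otherwise copy the designated first operand data element.
            result.append(data[ctrl & select_mask])
    return result
```

For example, with eight data elements `[10, 20, 30, 40, 50, 60, 70, 80]` and control elements `[7, 0x80, 0, 3, 3, 0x85, 1, 6]`, the second and sixth resultant positions are zeroed and the remaining positions receive the designated data elements.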

2. The method of claim 1 wherein each of said L control elements occupies a 
particular position in said second operand and is associated with a similarly located data 
element position in a resultant. 

3. The method of claim 2 wherein each of said L data elements occupies a particular 
position in said first operand. 

4. The method of claim 3 wherein said control element is to designate a first operand 
data element by a data element position number. 

5. The method of claim 4 wherein each of said control elements is comprised of: 

a flush to zero field, said flush to zero field to indicate whether a data element 
position associated with this control element is to be filled with a zero value; and 

a selection field, said selection field to indicate which first operand data element 
to shuffle data from. 
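
As an illustration of the two fields recited above, the following sketch decodes a byte wide control element into a flush to zero flag and a selection value. The bit positions are an assumption: bit 7 is taken as the flush to zero field and the low-order bits as the selection field, which is the arrangement used by the x86 PSHUFB instruction, but the claims themselves do not specify a layout. The names `ControlElement` and `decode_control` are hypothetical.

```python
from typing import NamedTuple


class ControlElement(NamedTuple):
    flush_to_zero: bool  # True if the associated position is to be zeroed
    selection: int       # position number of the first operand data element


def decode_control(byte, num_elements=16):
    """Split one byte wide control element into its two fields.

    ASSUMED layout: bit 7 = flush to zero field,
    low log2(num_elements) bits = selection field.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("control element must fit in one byte")
    if num_elements <= 0 or num_elements & (num_elements - 1):
        raise ValueError("this sketch assumes a power-of-two element count")
    return ControlElement(
        flush_to_zero=bool(byte & 0x80),
        selection=byte & (num_elements - 1),
    )
```

Under this assumed layout, `decode_control(0x83)` reports the flush to zero field as set, and `decode_control(0x0F, 8)` selects data element 7 because only three selection bits are needed for eight elements.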

6. The method of claim 5 wherein each of said control elements is further comprised 
of a source select field. 

7. The method of claim 2 further comprising outputting a resultant data block 
comprising data that was shuffled from said first operand in response to said control 
elements of said second operand. 

8. The method of claim 1 wherein each of said data elements comprises a byte of 
data. 

9. The method of claim 8 wherein each of said control elements is a byte wide. 

10. The method of claim 9 wherein L is 8 and wherein said first operand, said second 
operand, and said resultant are each comprised of 64-bit wide packed data. 

11. The method of claim 9 wherein L is 16 and wherein said first operand, said 
second operand, and said resultant are each comprised of 128-bit wide packed data. 
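
For the packed data widths of claims 10 and 11, the same operation can be sketched on whole 64-bit or 128-bit integers. This is an illustrative model only. It assumes that data element position 0 is the least significant byte of each operand (little-endian element numbering), that bit 7 of each control byte is the flush to zero field, and that the low-order bits form the selection field; none of these choices is stated in the claims. The name `packed_shuffle` is hypothetical.

```python
def packed_shuffle(first_operand, second_operand, width_bits=64):
    """Shuffle the bytes of first_operand under control of second_operand.

    Both operands and the resultant are packed integers of width_bits bits
    (64 gives L = 8 byte elements, 128 gives L = 16).
    ASSUMED: element 0 is the least significant byte.
    """
    if width_bits not in (64, 128):
        raise ValueError("this sketch covers only 64-bit and 128-bit packed data")
    n = width_bits // 8  # L, the number of byte wide elements
    data = first_operand.to_bytes(n, "little")
    controls = second_operand.to_bytes(n, "little")

    # One resultant byte per control byte: zero if flush to zero is set,
    # otherwise the data byte named by the selection field.
    out = bytes(0 if c & 0x80 else data[c & (n - 1)] for c in controls)
    return int.from_bytes(out, "little")
```

With these assumptions, a 64-bit control operand of `0x0001020304050607` reverses the byte order of the first operand, and a control operand whose bytes are all `0x80` yields a resultant of zero.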

12. An apparatus comprising: 

an execution unit to execute a shuffle instruction including a first operand 
comprised of a set of L data elements and a second operand comprised of a set of L 
control elements, said shuffle instruction to cause said execution unit to: 

for each individual control element, determine whether its flush to zero 
field is set, and place a zero into an associated resultant data element position 
if true, otherwise shuffle data from a first operand data element designated by 
said individual control element to said associated resultant data element 
position. 

13. The apparatus of claim 12 wherein each of said L control elements occupies a 
position in said second operand and is associated with a similarly located data element 
position in a resultant. 

14. The apparatus of claim 13 wherein each individual control element is to designate 
a first operand data element by a data element position number. 

15. The apparatus of claim 14 wherein each of said control elements is comprised of: 

a flush to zero field, said flush to zero field to indicate whether a data element 
position associated with this control element is to be filled with a zero value; and 

a selection field, said selection field to indicate which first operand data element 
to shuffle data from. 

16. The apparatus of claim 15 wherein each of said control elements is further 
comprised of a source select field. 

17. The apparatus of claim 16 wherein said shuffle instruction is to further cause said 
execution unit to generate a resultant having L data element positions that have been 
filled based on said set of L control elements. 

18. The apparatus of claim 12 wherein each of said data elements comprises a byte of 
data and each of said control elements is a byte wide. 

19. The apparatus of claim 18 wherein L is 8 and wherein said first operand, said second 
operand, and said resultant are each comprised of 64-bit wide packed data. 

20. The apparatus of claim 18 wherein L is 16 and wherein said first operand, said 
second operand, and said resultant are each comprised of 128-bit wide packed data. 

21. An article comprising a machine readable medium that stores data representing a 
predetermined function comprising: 

receiving a first operand having a set of L data elements and a second operand 
having a set of L control elements; and 

for each control element, shuffling data from a first operand data element 
designated by said control element to an associated resultant data element position if its 
flush to zero field is not set and placing a zero into said associated resultant data 
element position if its flush to zero field is set. 

22. The article of claim 21 wherein said data stored by said machine readable medium 
represents an integrated circuit design, which when fabricated performs said 
predetermined function in response to a single instruction. 

23. The article of claim 22 wherein said predetermined function further comprises 
generating a resultant having L data element positions that have been filled in accordance with 
said set of L control elements. 

24. The article of claim 23 wherein each of said L control elements is associated with 
a similarly located data element position in a resultant. 

25. The article of claim 24 wherein each individual control element is to designate a 
first operand data element by a data element position number. 

26. The article of claim 25 wherein each of said data elements comprises a byte of 
data. 

27. The article of claim 26 wherein each of said control elements is comprised of: 

a flush to zero field, said flush to zero field to indicate whether a data element 
position associated with this control element is to be filled with a zero value; and 

a selection field, said selection field to indicate which first operand data element 
to shuffle data from. 

28. The article of claim 27 wherein each of said control elements is further comprised 
of a source select field. 

29. The article of claim 21 wherein said data stored by said machine readable medium 
represents a computer instruction, which, if executed by a machine, causes said machine 
to perform said predetermined function. 

30. A method comprising: 

receiving a first operand having a set of L data elements; 

receiving a second operand having a set of L masks, wherein each of said L masks 
occupies a particular position in said second operand and is associated with a 
similarly located data element position in a resultant, each of said L masks to include 
a flush to zero field; 

for each mask, determining whether its flush to zero field is set, and placing a 
zero into an associated resultant data element position if true; and 

if its flush to zero field is not set, shuffling data from a first operand data element 
designated by said mask to said associated resultant data element position. 

31. The method of claim 30 wherein each of said L masks occupies a particular 
position in said second operand and is associated with a similarly located data element 
position in said resultant. 

32. The method of claim 31 wherein each of said L masks is comprised of: 

a flush to zero field, said flush to zero field to indicate whether a data element 
position associated with this control element is to be filled with a zero value; and 

a selection field, said selection field to indicate which first operand data element 
to shuffle data from. 

33. The method of claim 32 wherein each of said masks is further comprised of a 
source select field. 

34. The method of claim 33 wherein said first operand, said second operand, and said 
resultant are each comprised of 64-bit wide packed data. 

35. The method of claim 33 wherein said first operand, said second operand, and said 
resultant are each comprised of 128-bit wide packed data. 

36. A method comprising: 

receiving a first operand having a set of L data elements; 

receiving a second operand having a set of L shuffle masks, each of said L shuffle 
masks associated with a similarly located data element position in a resultant; 

for each individual shuffle mask, determining whether its flush to zero field is set, 
and placing a zero into an associated resultant data element position if true, otherwise 
shuffling data from a first operand data element designated by said individual shuffle 
mask to said associated resultant data element position. 

37. The method of claim 36 wherein each of said L shuffle masks is comprised of: 

a flush to zero field, said flush to zero field to indicate whether a data element 
position associated with this control element is to be filled with a zero value; and 

a selection field, said selection field to indicate which first operand data element 
to shuffle data from. 

38. The method of claim 37 wherein each of said masks is further comprised of a 
source select field. 

39. An apparatus comprising: 

a first memory location to store a plurality of source data elements; 

a second memory location to store a plurality of control elements, each of said 
control elements to correspond to a resultant data element position, and each of said 
control elements to include a flush to zero field and a selection field; 

control logic coupled to said second memory location, said control logic in 
response to values of said control elements to generate a plurality of selection signals 
and a plurality of flush to zero signals; 

a first plurality of multiplexers coupled to said first memory location and said 
plurality of selection signals, each of said first plurality of multiplexers to shuffle a 
data element for a specific resultant data element position in response to a selection 
signal corresponding to said specific resultant data element position; and 

a second plurality of multiplexers coupled to said first plurality of multiplexers 
and to said plurality of flush to zero signals, each of said second plurality of 
multiplexers associated with a specific resultant data element position, each of said 
second plurality of multiplexers to output a zero if its flush to zero signal is active or 
to output a data element shuffled for that specific resultant data element position. 
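
The two-stage multiplexer arrangement of claim 39 can be pictured with a small behavioral model. This is a sketch under stated assumptions rather than a description of actual circuitry: the function names are hypothetical, each control element is assumed to be a byte with bit 7 as the flush to zero field and the low-order bits as the selection field, and the number of source data elements is assumed to be a power of two.

```python
def control_logic(controls, num_sources):
    """Derive one selection signal and one flush to zero signal per
    resultant data element position (ASSUMED bit layout, see lead-in)."""
    selects = [c & (num_sources - 1) for c in controls]
    flush_signals = [bool(c & 0x80) for c in controls]
    return selects, flush_signals


def first_stage_mux(sources, select):
    # First plurality of multiplexers: pick one source data element.
    return sources[select]


def second_stage_mux(shuffled, flush_active):
    # Second plurality of multiplexers: zero if the flush to zero signal
    # is active, otherwise pass the shuffled data element through.
    return 0 if flush_active else shuffled


def shuffle_datapath(sources, controls):
    """Behavioral model of the whole datapath, one output per control element."""
    selects, flush_signals = control_logic(controls, len(sources))
    stage1 = [first_stage_mux(sources, s) for s in selects]
    return [second_stage_mux(d, z) for d, z in zip(stage1, flush_signals)]
```

Keeping the zeroing step in a separate second stage mirrors the claim language: the first stage always selects some data element, and the flush to zero signal only decides whether that selection or a zero reaches the resultant position.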

40. The apparatus of claim 39 wherein said plurality of source data elements is a first 
packed data operand. 

41. The apparatus of claim 40 wherein said plurality of control elements is a second 
packed data operand. 

42. The apparatus of claim 40 wherein said first and second memory locations are 
single instruction multiple data registers. 

43. The apparatus of claim 42 wherein: 

said first packed operand is 64 bits long and each of said source data elements is a 
byte wide; and 

said second packed operand is 64 bits long and each of said control elements is a 
byte wide. 

44. The apparatus of claim 42 wherein: 

said first packed operand is 128 bits long and each of said source data elements is 
a byte wide; and 

said second packed operand is 128 bits long and each of said control elements is a 
byte wide. 

45. An apparatus comprising: 

control logic to receive a set of L shuffle masks, wherein each shuffle mask is 
associated with a unique resultant data element position, said control logic to provide 
a select signal and a flush to zero signal for each resultant data element position; 

a set of L multiplexers coupled to said control logic, wherein each multiplexer is 
also associated with a unique resultant data element position, each multiplexer to 
output a zero if its associated flush to zero signal is active and to output data shuffled 
from a set of M data elements based on its associated select signal if its associated 
flush to zero signal is inactive. 

46. The apparatus of claim 45 further comprising a register with L unique data 
element positions, each data element position to hold an output from its associated 
multiplexer. 

47. The apparatus of claim 46 wherein L is 16 and M is 16. 

48. A system comprising: 

a memory to store data and instructions; 

a processor coupled to said memory on a bus, said processor operable to perform 
a shuffle operation, said processor comprising: 

a bus unit to receive an instruction from said memory, said instruction to 
cause a data shuffle on at least one of L data elements from a first operand 
based on a set of L shuffle control elements from a second operand; 

an execution unit coupled to said bus unit, said execution unit to execute 
said instruction, said instruction to cause said execution unit to: 

for each shuffle control element, shuffle data from a first operand 
data element designated by said shuffle control element to an 
associated resultant data element position if its flush to zero field is not 
set and place a zero into said associated resultant data element position 
if its flush to zero field is set. 

49. The system of claim 48 wherein each shuffle control element is comprised of: 

a flush to zero field, said flush to zero field to indicate whether a data element 
position associated with this shuffle control element is to be filled with a zero value; 
and 

a selection field, said selection field to indicate which first operand data element 
to shuffle data from. 

50. The system of claim 49 wherein each shuffle control element is further comprised 
of a source select field. 

51. The system of claim 48 wherein said instruction is a packed byte shuffle 
instruction with flush to zero capability. 

52. The system of claim 48 wherein each data element is a byte wide, each shuffle 
control element is a byte wide, and L is 8. 

53. The system of claim 48 wherein said first operand is 64 bits long and said second 
operand is 64 bits long. 